Genome analysis
Exploiting hidden information interleaved in the redundancy of the genetic code without prior knowledge
Abstract
Motivation: Dozens of studies in recent years have demonstrated that codon usage encodes information related to all stages of gene expression regulation. When relevant, high-quality, large-scale gene expression data are available, these signals can be statistically inferred and modelled, enabling the analysis and engineering of gene expression. However, when such data are not available, it is impossible to infer and validate such models.
Results: In this study, we propose Chimera, an unsupervised, computationally efficient approach for exploiting hidden high-dimensional information related to the way gene expression is encoded in the open reading frame (ORF), based solely on the genome of the analysed organism. One version of the approach, named Chimera Average Repetitive Substring (ChimeraARS), estimates the adaptability of an ORF to the intracellular gene expression machinery of a genome (host) by computing the ORF's tendency to include long substrings that appear in the host's coding sequences. The second version, named ChimeraMap, engineers the codons of a protein so that its ORF includes long codon substrings that appear in the host's coding sequences, improving its adaptation to a new host's gene expression machinery. We demonstrate the applicability of the new approach for analysing and engineering heterologous genes and for analysing endogenous genes. Specifically, focusing on Escherichia coli, we show that it can exploit information that cannot be detected by conventional approaches such as the CAI (Codon Adaptation Index), which only consider single-codon distributions. For example, we report correlations of up to 0.67 between the ChimeraARS measure and heterologous gene expression, where the CAI yielded no correlation.
Availability and implementation: For non-commercial purposes, the code of the Chimera approach can be downloaded from http://www.cs.tau.ac.il/ tamirtul/Chimera/download.htm.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
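The two measures described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state:

* matching is done on whole codons;
* ARS is taken to be the mean, over the codon positions of an ORF, of the length of the longest codon substring starting at that position that occurs anywhere in the host coding sequences;
* ChimeraMap is replaced by a simple greedy stand-in, which repeatedly copies the host codons of the longest matching amino-acid run.

The names `chimera_ars`, `chimera_map_greedy` and `longest_match` are hypothetical and are not the API of the downloadable Chimera code. The naive scan also stands in for the efficient substring index (for example, a suffix array) that genome-scale data would require.

```python
"""Hypothetical sketch of the ChimeraARS and ChimeraMap ideas (not the Chimera code).

Assumptions: uppercase DNA strings made of A/C/G/T, matching on whole codons, and a
naive scan instead of the suffix-array-style index needed at genome scale.
"""
from typing import Dict, List, Sequence, Tuple

# Standard genetic code; codons enumerated with each base in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq: str) -> List[str]:
    """Split a DNA string into codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def longest_match(query: Sequence[str], start: int,
                  refs: List[List[str]]) -> Tuple[int, Tuple[int, int]]:
    """Longest run query[start:start+L] occurring in any reference, with the
    (reference index, offset) of the first occurrence of that length found."""
    best_len, best_loc = 0, (-1, -1)
    for r, ref in enumerate(refs):
        for off in range(len(ref)):
            n = 0
            while (start + n < len(query) and off + n < len(ref)
                   and query[start + n] == ref[off + n]):
                n += 1
            if n > best_len:
                best_len, best_loc = n, (r, off)
    return best_len, best_loc


def chimera_ars(orf: str, host_orfs: List[str]) -> float:
    """Assumed ARS: mean over codon positions of the longest codon substring
    starting there that also occurs in some host ORF."""
    query = codons(orf)
    refs = [codons(h) for h in host_orfs]
    if not query:
        return 0.0
    total = sum(longest_match(query, i, refs)[0] for i in range(len(query)))
    return total / len(query)


def chimera_map_greedy(protein: str, host_orfs: List[str]) -> str:
    """Greedy stand-in for ChimeraMap: copy host codons for the longest
    amino-acid run matching the protein from the current position onward."""
    host_codons = [codons(h) for h in host_orfs]
    host_aa = [[CODON_TABLE[c] for c in cs] for cs in host_codons]
    out: List[str] = []
    i = 0
    while i < len(protein):
        n, (r, off) = longest_match(protein, i, host_aa)
        if n == 0:
            raise ValueError(f"amino acid {protein[i]!r} not found in host ORFs")
        out.extend(host_codons[r][off:off + n])
        i += n
    return "".join(out)
```

Consider host = ["ATGGCTAAAGGT", "ATGAAAGGTTAA"]:

* `chimera_ars("ATGAAAGGT", host)` gives 2.0, because the matched runs starting at its three codons have lengths 3, 2 and 1.
* `chimera_map_greedy("MKGA", host)` returns "ATGAAAGGTGCT". It stitches ATG AAA GGT from the second host ORF to GCT from the first.

When scoring an endogenous gene with this sketch, that gene should presumably be left out of `host_orfs`; otherwise it trivially matches itself.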
Similar resources
Coronavirus: Discover the Structure of Global Knowledge, Hidden Patterns & Emerging Events
Background & Objective: The present study aimed at exploring the structure of global knowledge, hidden patterns, and emerging Coronavirus events using co-word techniques. Co-word analysis is one of the most efficient scientific methods to analyze the structure and dynamics of knowledge and the general state of research. Materials & Methods: This applied research was performed using Co-word anal...
A serial concatenation approach to iterative demodulation and decoding
Iterative demodulation and decoding of convolutionally encoded data is treated as a special case of the recently proposed serial concatenation of interleaved codes. It is shown that by exploiting the recursive nature of the differential modulation schemes (for example, DBPSK, DQPSK, CPM, etc.), large interleaving gains can be achieved similar to serial concatenation schemes. We also show that w...
An Interleaved Configuration of Modified KY Converter with High Conversion Ratio for Renewable Energy Applications; Design, Analysis and Implementation
In this paper, a new high efficiency, high step-up, non-isolated, interleaved DC-DC converter for renewable energy applications is presented. In the suggested topology, two modified step-up KY converters are interleaved to obtain a high conversion ratio without the use of coupled inductors. In comparison with the conventional interleaved DC-DC converters such as boost, buck-boost, SEPIC, ZETA a...
The effects of segmentation and redundancy methods on cognitive load and vocabulary learning and comprehension of English lessons in a multimedia learning environment
The present study was conducted with the aim of investigating the effects of segmentation and redundancy methods on cognitive load and vocabulary learning and comprehension of English lessons in a multimedia learning environment. In terms of purpose, this is applied research, and its design is truly experimental. The statistical population of the present study includes all people aged 14 to 16 who are enrolled in ...